Unsupervised Keyword Extraction from Polish Legal Texts

Authors

  • Michal Jungiewicz
  • Michal Lopuszynski
Abstract

In this work, we present an application of the recently proposed unsupervised keyword extraction algorithm RAKE to a corpus of Polish legal texts from the field of public procurement. RAKE is essentially a language- and domain-independent method. Its only language-specific input is a stoplist containing a set of non-content words. The performance of the method depends heavily on the choice of this stoplist, which should be adapted to the domain. Therefore, we complement the RAKE algorithm with an automatic approach to selecting non-content words, which is based on the statistical properties of term distribution.
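The abstract does not spell out how RAKE scores keywords, so the following is only a minimal sketch of the commonly described formulation: candidate phrases are the word runs between stoplist words and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores. It is not the authors' implementation. The function names, the English sample sentence, and the tiny stoplist are hypothetical, and the automatic stoplist selection is not shown because the abstract gives no details of it.

```python
import re
from collections import defaultdict


def candidate_phrases(text, stoplist):
    """Split text into candidate phrases at punctuation and stoplist words."""
    phrases = []
    # Punctuation always closes a candidate phrase.
    for fragment in re.split(r"[.,;:!?()\[\]\"\n]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stoplist:
                # A non-content word closes the current phrase.
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_scores(text, stoplist):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, stoplist)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, the word itself included.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `rake_scores(text, stoplist)` with a different stoplist changes where phrases are cut, which is why the choice of non-content words matters so much for the result.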


Similar resources

Application of Topic Models to Judgments from Public Procurement Domain

[4] M. Jungiewicz, M. Łopuszyński, Unsupervised keyword extraction from Polish legal texts, In Advances in NLP, 65–70, Springer (2014) [5] M. Łopuszyński, Ł. Bolikowski, Towards robust tags from NLP tools and Wikipedia, Int. Journal of Digit. Libraries (2015) (available online) I acknowledge the support from the SAOS project financed by the National Centre for Research and Development. I acknow...


TextRank: Bringing Order Into Texts

In this paper, we introduce TextRank – a graph-based ranking model for text processing, and show how this model can be successfully used in natural language applications. In particular, we propose two innovative unsupervised methods for keyword and sentence extraction, and show that the results obtained compare favorably with previously published results on established benchmarks.


Keyword Extraction for Text Characterization

Keywords are valuable means for characterizing texts. In order to extract keywords we propose an efficient and robust, language- and domain-independent approach which is based on small word parts (quadgrams). The basic algorithm can be improved by reexamining and re-ranking keywords using edit distance (i.e. Levenshtein distance) and an algorithm based on the relativistic addition of velocities ...


Resources for Information Extraction from Polish texts

The paper presents a collection of resources developed for Information Extraction (IE) from Polish texts. In particular, we mention two IE platforms adapted to Polish and several IE applications built on top of one of them: named entity recognition, creation of terminology lexicons, and data extraction from medical texts.


Keyword extraction: a review of methods and approaches

This paper presents a survey of methods and approaches for the keyword extraction task. In addition to systematizing the methods, the paper gathers a comprehensive review of existing research. Related work on keyword extraction is elaborated for supervised and unsupervised methods, with special emphasis on graph-based methods as well as Croatian keyword extraction. Selectivity-based keyword extracti...




Publication date: 2014